Smart Paper Notes

Improving Robust Fairness via Balance Adversarial Training

Chunyu Sun, Chenye Xu, Chengyuan Yao, Siyuan Liang, Yichao Wu, Ding Liang, XiangLong Liu, Aishan Liu

Categories: Machine Learning | Artificial Intelligence

2022-09-15

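Adversarial training (AT) methods are effective against adversarial attacks, yet they introduce a severe disparity in accuracy and robustness between classes, known as the robust fairness problem. The previously proposed Fair Robust Learning (FRL) adaptively reweights classes to improve fairness. However, the performance of the better-performing classes decreases, leading to a strong drop in overall performance. In this paper, we observe two unfair phenomena during adversarial training: different difficulties in generating adversarial examples from each class (source-class fairness) and disparate target-class tendencies when generating adversarial examples (target-class fairness). Based on these observations, we propose Balance Adversarial Training (BAT) to address the robust fairness problem. For source-class fairness, we adjust the attack strength and difficulty for each class to generate samples near the decision boundary, which makes model learning easier and fairer; for target-class fairness, we introduce a uniform distribution constraint that encourages the adversarial example generation process of each class to have a fair tendency. Extensive experiments on multiple datasets (CIFAR-10, CIFAR-100, and ImageNette) demonstrate that our method significantly outperforms other baselines in mitigating the robust fairness problem (+5-10% worst-class accuracy).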
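
The two quantities this abstract refers to can be illustrated with a small sketch. It is not the authors' implementation: `worst_class_accuracy` computes the reported metric in the obvious way, and `kl_to_uniform` is only one possible form a uniform distribution constraint could take (the abstract does not give the actual loss). Both function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def worst_class_accuracy(labels, predictions):
    """Return (worst_class, accuracy) over the classes present in labels."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for y, p in zip(labels, predictions):
        total[y] += 1
        correct[y] += int(y == p)
    per_class = {c: correct[c] / total[c] for c in total}
    worst = min(per_class, key=per_class.get)
    return worst, per_class[worst]


def kl_to_uniform(probs):
    """KL(p || uniform) for a probability vector; 0 means perfectly uniform.

    Assumption: a penalty of this shape could push the distribution of
    target classes towards uniform. The paper's actual constraint may differ.
    """
    k = len(probs)
    return sum(p * math.log(p * k) for p in probs if p > 0)


# Toy predictions: class 0 is always right, class 2 is mostly wrong.
labels = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
predictions = [0, 0, 0, 0, 1, 1, 1, 0, 2, 0, 0, 1]
worst, acc = worst_class_accuracy(labels, predictions)
print(worst, acc)
print(kl_to_uniform([0.9, 0.05, 0.05]), kl_to_uniform([1 / 3, 1 / 3, 1 / 3]))
```

Average accuracy on this toy data is 8/12, which hides the fact that one class is correct only a quarter of the time; that gap is what the worst-class metric exposes.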

translated by Google Translate

Semantic Segmentation-Assisted Instance Feature Fusion for Multi-Level 3D Part Instance Segmentation

Chunyu Sun, Xin Tong, Yang Liu

Categories: Computer Vision

2022-08-09

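Recognizing 3D part instances from a 3D point cloud is crucial for 3D structure and scene understanding. Several learning-based approaches use semantic segmentation and instance center prediction as training tasks but fail to further exploit the inherent relationship between shape semantics and part instances. In this paper, we present a new method for 3D part instance segmentation. Our method uses semantic segmentation to fuse non-local instance features, such as center prediction, and further enhances the fusion scheme in a multi-level and cross-level way. We also propose a semantic region center prediction task, and we train on it and use its predictions to improve the clustering of instance points. Our method outperforms existing methods by a large margin on the PartNet benchmark. We also demonstrate that our feature fusion scheme can be applied to other existing methods to improve their performance on indoor scene instance segmentation tasks.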


WPPG Net: A Non-contact Video Based Heart Rate Extraction Network Framework with Compatible Training Capability

Weiyu Sun, Xinyu Zhang, Ying Chen, Yun Ge, Chunyu Ji, Xiaolin Huang

Categories: Computer Vision

2022-07-04

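Our facial skin presents subtle color changes, known as the remote photoplethysmography (rPPG) signal, from which we can extract the subject's heart rate. Recently, many deep learning methods and related datasets for rPPG signal extraction have been proposed. However, because blood takes time to flow through the body, among other factors, label waves such as BVP signals have uncertain delays relative to the real rPPG signals in some datasets, which makes it difficult to train networks that directly output predicted rPPG waves. In this paper, by analyzing the characteristics of rhythm and periodicity that rPPG signals and label waves share, we propose a complete training methodology that wraps these networks so that they remain effective when trained on datasets with frequent uncertain delays, and so that they obtain more precise and robust heart rate predictions than other delay-free rPPG extraction methods.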
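
The point about rhythm and periodicity can be illustrated with a minimal sketch: a heart rate read from the dominant frequency of a wave does not change when the wave is shifted in time, even though a sample-by-sample comparison of the two waves would. This is not the WPPG training method; the functions `estimate_bpm` and `synthetic_wave`, the 42 to 240 BPM search band, and the synthetic 72 BPM signal are all assumptions made for the example.

```python
import cmath
import math


def estimate_bpm(signal, fps, lo_bpm=42, hi_bpm=240):
    """Return the integer BPM in [lo_bpm, hi_bpm] with the largest spectral power."""
    mean = sum(signal) / len(signal)
    centered = [s - mean for s in signal]
    best_bpm, best_power = lo_bpm, -1.0
    for bpm in range(lo_bpm, hi_bpm + 1):
        freq = bpm / 60.0  # candidate heart rate in Hz
        acc = sum(
            x * cmath.exp(-2j * math.pi * freq * n / fps)
            for n, x in enumerate(centered)
        )
        power = abs(acc) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm


def synthetic_wave(bpm, fps, seconds, delay_s=0.0):
    """Sine wave standing in for a pulse signal, optionally delayed."""
    n = int(fps * seconds)
    return [math.sin(2 * math.pi * (bpm / 60.0) * (i / fps - delay_s)) for i in range(n)]


fps = 30
rppg = synthetic_wave(72, fps, 10)                 # stand-in for the facial signal
label = synthetic_wave(72, fps, 10, delay_s=0.3)   # stand-in for a delayed label wave
print(estimate_bpm(rppg, fps), estimate_bpm(label, fps))
```

Both calls return the same rate because a constant delay changes only the phase of the spectrum, not its magnitude.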


Disjoint Masking with Joint Distillation for Efficient Masked Image Modeling

Xin Ma, Chang Liu, Chunyu Xie, Long Ye, Yafeng Deng, Xiangyang Ji

Categories: Computer Vision

2022-12-31

Masked image modeling (MIM) has shown great promise for self-supervised learning (SSL) yet has been criticized for learning inefficiency. We believe insufficient utilization of training signals is responsible. To alleviate this issue, we introduce a conceptually simple yet learning-efficient MIM training scheme, termed Disjoint Masking with Joint Distillation (DMJD). For disjoint masking (DM), we sequentially sample multiple masked views per image in a mini-batch with the disjoint regulation, which raises the usage of tokens for reconstruction in each image while keeping the masking rate of each view. For joint distillation (JD), we adopt a dual-branch architecture to predict invisible (masked) and visible (unmasked) tokens, respectively, with superior learning targets. Rooted in orthogonal perspectives on improving training efficiency, DM and JD cooperatively accelerate training convergence without sacrificing the model's generalization ability. Concretely, DM can train ViT with half of the effective training epochs (3.7 times less time-consuming) and still report competitive performance. With JD, our DMJD clearly improves the linear probing classification accuracy over ConvMAE by 5.8%. On fine-grained downstream tasks such as semantic segmentation and object detection, our DMJD also shows superior generalization compared with state-of-the-art SSL methods. The code and model will be made public at https://github.com/mx-mark/DMJD.


Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis

Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang

Categories: Artificial Intelligence | Natural Language Processing

2022-12-13

Cross-speaker style transfer in speech synthesis aims at transferring a style from source speaker to synthesised speech of a target speaker's timbre. Most previous approaches rely on data with style labels, but manually-annotated labels are expensive and not always reliable. In response to this problem, we propose Style-Label-Free, a cross-speaker style transfer method, which can realize the style transfer from source speaker to target speaker without style labels. Firstly, a reference encoder structure based on quantized variational autoencoder (Q-VAE) and style bottleneck is designed to extract discrete style representations. Secondly, a speaker-wise batch normalization layer is proposed to reduce the source speaker leakage. In order to improve the style extraction ability of the reference encoder, a style invariant and contrastive data augmentation method is proposed. Experimental results show that the method outperforms the baseline. We provide a website with audio samples.
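
To make "discrete style representations" concrete, the sketch below shows nearest-neighbour codebook lookup, the quantization step commonly used in vector-quantized autoencoders. The abstract does not say how Q-VAE quantizes its latent, so treating it as a codebook lookup is an assumption, and the `quantize` function, the 2-D vectors, and the four-entry codebook are hypothetical.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry nearest to vector (squared L2)."""

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical codebook of four style codes and one continuous encoder output.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
index, code = quantize([0.9, 0.2], codebook)
print(index, code)
```

The continuous vector is replaced by one of a finite set of codes, so the downstream model only ever sees a discrete style token.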


Interdisciplinary Discovery of Nanomaterials Based on Convolutional Neural Networks

Tong Xie, Yuwei Wan, Weijian Li, Qingyuan Linghu, Shaozhou Wang, Yalun Cai, Han Liu, Chunyu Kit, Clara Grazian, Bram Hoex

Categories: Machine Learning

2022-12-06

The material science literature contains up-to-date and comprehensive scientific knowledge of materials. However, their content is unstructured and diverse, resulting in a significant gap in providing sufficient information for material design and synthesis. To this end, we used natural language processing (NLP) and computer vision (CV) techniques based on convolutional neural networks (CNN) to discover valuable experimental-based information about nanomaterials and synthesis methods in energy-material-related publications. Our first system, TextMaster, extracts opinions from texts and classifies them into challenges and opportunities, achieving 94% and 92% accuracy, respectively. Our second system, GraphMaster, realizes data extraction of tables and figures from publications with 98.3% classification accuracy and 4.3% data extraction mean square error. Our results show that these systems could assess the suitability of materials for a certain application by evaluation of synthesis insights and case analysis with detailed references. This work offers a fresh perspective on mining knowledge from scientific literature, providing a wide swatch to accelerate nanomaterial research through CNN.


Robust Multi-Object Tracking by Marginal Inference

Yifu Zhang, Chunyu Wang, Xinggang Wang, Wenjun Zeng, Wenyu Liu

Categories: Computer Vision

2022-08-07

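Multi-object tracking in videos requires solving a fundamental problem: the one-to-one assignment between objects in adjacent frames. Most methods address the problem by first discarding impossible pairs whose feature distances are larger than a threshold and then linking objects with the Hungarian algorithm to minimize the overall distance. However, we find that the distribution of distances computed from Re-ID features may vary significantly across videos, so there is no single optimal threshold that allows us to safely discard impossible pairs. To address the problem, we present an efficient approach that computes a marginal probability for each pair of objects in real time. The marginal probability can be regarded as a normalized distance that is significantly more stable than the original feature distance. As a result, we can use a single threshold for all videos. The approach is general and can be applied to existing trackers to obtain an improvement of about one point in the IDF1 metric. It achieves competitive results on the MOT17 and MOT20 benchmarks. In addition, the computed probabilities are more interpretable, which facilitates subsequent post-processing operations.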
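
A brute-force sketch can show what a marginal probability per pair means. It rests on several assumptions that are not taken from the paper: every matching is a complete one-to-one assignment on a square distance matrix, each matching is weighted by `exp(-total distance)`, and all matchings are enumerated, which is only feasible for a handful of objects. The paper describes an efficient real-time computation that is not reproduced here; the function name and the toy distances are hypothetical.

```python
import math
from itertools import permutations


def marginal_match_probabilities(dist):
    """Brute-force pair marginals for a square distance matrix dist[i][j].

    Assumption: a matching's weight is exp(-sum of its pair distances), and
    the marginal of pair (i, j) is the normalized total weight of all
    matchings that contain it.
    """
    n = len(dist)
    marg = [[0.0] * n for _ in range(n)]
    total = 0.0
    for perm in permutations(range(n)):
        weight = math.exp(-sum(dist[i][perm[i]] for i in range(n)))
        total += weight
        for i in range(n):
            marg[i][perm[i]] += weight
    return [[m / total for m in row] for row in marg]


# Toy feature distances between 3 tracks (rows) and 3 detections (columns).
dist = [
    [0.2, 0.9, 0.8],
    [0.7, 0.1, 0.9],
    [0.8, 0.9, 0.3],
]
probs = marginal_match_probabilities(dist)
for row in probs:
    print([round(p, 3) for p in row])
```

Each row sums to 1, so every entry lies in [0, 1] regardless of the scale of the raw distances, which is what makes a single fixed threshold (0.5 here, chosen only for illustration) meaningful.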


Neural Contourlet Network for Monocular 360 Depth Estimation

Zhijie Shen, Chunyu Lin, Lang Nie, Kang Liao, Yao Zhao

Categories: Computer Vision

2022-08-03

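Depth estimation for monocular 360 images is challenging because the distortion increases along the latitude. To perceive the distortion, existing methods focus on designing deep and complex network architectures. In this paper, we offer a new perspective that constructs an interpretable and sparse representation for a 360 image. Considering the importance of geometric structure in depth estimation, we use the contourlet transform to capture an explicit geometric cue in the spectral domain and integrate it with an implicit cue in the spatial domain. Specifically, we propose a neural contourlet network consisting of a convolutional neural network and a contourlet transform branch. In the encoder stage, we design a spatial-spectral fusion module to effectively fuse the two types of cues. In the decoder, conversely, we employ the inverse contourlet transform with learned low-pass subbands and band-pass directional subbands to compose the depth. Experiments on three popular panoramic image datasets demonstrate that the proposed approach outperforms state-of-the-art schemes and converges faster. Code is available at https://github.com/zhijieshen-bjtu/neural-contourlet-network-for-mode.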


One-Shot Medical Landmark Localization by Edge-Guided Transform and Noisy Landmark Refinement

Zihao Yin, Ping Gong, Chunyu Wang, Yizhou Yu, Yizhou Wang

Categories: Computer Vision

2022-07-31

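As an important upstream task for many medical applications, supervised landmark localization still requires non-negligible annotation costs to achieve desirable performance. In addition, because collection procedures are cumbersome, medical landmark datasets are limited in size, which reduces the effectiveness of large-scale self-supervised pre-training methods. To address these challenges, we propose a two-stage framework for one-shot medical landmark localization, which first infers landmarks on unlabeled targets from a labeled exemplar through unsupervised registration and then uses these noisy pseudo labels to train robust detectors. To handle significant structural variations, we learn an end-to-end cascade of global alignment and local deformations, guided by novel loss functions that incorporate edge information. In the second stage, we explore self-consistency for selecting reliable pseudo labels and cross-consistency for semi-supervised learning. Our method achieves state-of-the-art performance on public datasets of different body parts, which demonstrates its general applicability.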


Faster VoxelPose: Real-time 3D Human Pose Estimation by Orthographic Projection

Hang Ye, Wentao Zhu, Chunyu Wang, Rujie Wu, Yizhou Wang

Categories: Computer Vision

2022-07-22

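While voxel-based methods have achieved promising results for multi-person 3D pose estimation from multiple cameras, they suffer from heavy computational burdens, especially for large scenes. We present Faster VoxelPose to address this challenge by re-projecting the feature volume onto the three two-dimensional coordinate planes and estimating the X, Y, and Z coordinates from them separately. To that end, we first localize each person with a 3D bounding box by estimating a 2D box and its height from the volume features projected onto the xy-plane and the z-axis, respectively. Then, for each person, we estimate partial joint coordinates from the three coordinate planes separately and fuse them to obtain the final 3D pose. The method is free of costly 3D-CNNs and is ten times faster than VoxelPose while achieving accuracy competitive with state-of-the-art methods, demonstrating its potential for real-time applications.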
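
The localization idea (project the volume, then estimate coordinates from lower-dimensional views) can be sketched on a toy volume. The sketch assumes a max-projection along each axis and a single person represented by one peak; the actual method applies learned networks to the projected features, which is not reproduced here. `project_max`, `locate_peak`, and the 4 x 5 x 6 volume are hypothetical.

```python
def project_max(volume):
    """Max-project volume[x][y][z] onto the xy-plane and onto the z-axis."""
    nx, ny, nz = len(volume), len(volume[0]), len(volume[0][0])
    xy = [[max(volume[x][y]) for y in range(ny)] for x in range(nx)]
    z = [max(volume[x][y][k] for x in range(nx) for y in range(ny)) for k in range(nz)]
    return xy, z


def locate_peak(volume):
    """Estimate (x, y) from the xy projection and z from the z projection."""
    xy, z = project_max(volume)
    nx, ny = len(xy), len(xy[0])
    x, y = max(
        ((i, j) for i in range(nx) for j in range(ny)),
        key=lambda p: xy[p[0]][p[1]],
    )
    k = max(range(len(z)), key=lambda i: z[i])
    return x, y, k


# Toy feature volume with a single response at voxel (2, 3, 4).
nx, ny, nz = 4, 5, 6
volume = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
volume[2][3][4] = 1.0
print(locate_peak(volume))
```

After projection, the search runs over `nx * ny + nz` values instead of `nx * ny * nz`, which hints at why operating on projections is cheaper than processing the full 3D volume.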
